In this tutorial, I will show you how to set up R, perform some basic commands, and explore the data set we discussed in class. The purpose of this tutorial is to get you started with R and to help you understand the properties of the data set. You will also learn some of the most commonly used functions in R. I don’t expect you to be able to write R code from scratch (at this stage), so I will give you help along the way.
I wrote and compiled this document in R using Markdown (R can do lots of things beyond just analysing data). Via this document, I will walk you through a number of steps. For each step, I will provide some background and basic instructions, then the code as well as the output it generates. You can copy and paste the code into your own R script. If you have set everything up correctly on your own machine, when you run this code, you should see the same (or similar) output as that reported in this document.
Before we dive into the data, you need to install R and RStudio. Both are freely available online. The installation process is straightforward and there is lots of troubleshooting advice and guidance online if you run into any problems (feel free to email me if you have any issues that you can’t solve after a few Google searches).
To install R, go to https://cloud.r-project.org/. Follow the prompts and be sure to install the latest version.
To install RStudio, go to https://rstudio.com/products/rstudio/download/. Follow the prompts and be sure to install the latest version.
With both programs installed, you should now open RStudio. At first glance, there is a lot to take in. However, RStudio is less complex than it initially looks. The interface consists of four main panels: the source editor, the console, the environment pane and the browser pane.
For most applications, you will work with RStudio in the following manner. You will write and run your code from a script in the source editor. Output from your code is reported in the console. If you created an object with your code (e.g., data frame/tibble, function, etc), you can interact with this object in the environment pane. For instance, you can open and inspect your data from here. Finally, you can view your files/directory and plots in the browser pane.
The following video provides a useful overview of the RStudio interface (as well as information on how to install R and RStudio - helpful if you ran into trouble on the previous step):
With the installation out of the way, you are now ready to start working in R. First, you need to set up your directory. R needs to pull and push files from and to somewhere. This somewhere is your working directory. You can set up your directory in a number of different ways. I am going to have you do so by running your first piece of R code.
To begin, you need to open an R script by using the menu that drops down from the top-left icon:
In the script you have opened, you now need to enter a command to set the working directory. For simplicity, you can set your directory to the desktop. If you are working on a Mac, you will need to type something like the following:
setwd("/Users/username/Desktop")
If you are working on a Windows computer, you will need to type something like the following (note that in R you must double the backslashes in a file path, or use forward slashes instead):

setwd("C:\\Users\\username\\Desktop")
You will need to adapt the code I provided above to match the naming conventions on your machine (i.e., substitute in your own username, etc). On my machine - a Mac - I type and run the following line of code:
As shown in the screenshot above, to run a chunk of code in the editor, simply select the line/s of code you want to run and then click on the ‘Run’ icon in the source editor pane (alternatively, you can highlight the code and hit Cmd+Return on a Mac, or Ctrl+Enter on Windows). You will see the same line of code show up in the console pane below the source editor (if it is accompanied by an error message, something has gone wrong and your working directory will not be set to the desktop).
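If you want to double-check that the command worked, you can ask R to print the current working directory (getwd() is the base-R counterpart to setwd()):

```r
getwd()  # prints the current working directory; it should now point to your desktop
```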
Having set your working directory, you now want to store the raw data file we will be working with on the desktop.
First, you need to download the AFL data set from the course website. The data set can be accessed here: ….
Next, you need to move this file to the desktop on your computer. Once you have done this, the data set should show up as a csv file in the browser pane of your RStudio interface.
You want to see something that looks like the following:
As I mentioned in my lecture, you will primarily be working with the ‘tidyverse’ set of packages. As such, you will need to install and load the tidyverse package. Installing packages in R is straightforward and follows the same basic convention: pass the name of the package you want to install to the install.packages() function.
To install the tidyverse package, you will need to type the following line of code in your script and hit run (just like I showed you above when you set the working directory):
install.packages("tidyverse")
Once you have run this line of code, you should see some output show up in the console. This output will identify the package you are downloading and provide you with some additional information. Now that you have installed the tidyverse package, you need to load it for use. To do so, run the following line of code:
library(tidyverse)
Great. You are now ready to get your hands dirty. One final thing I will mention is that you do not need to install a package every time you want to use it. Once a package has been installed, it remains on your local machine. However, you do need to load installed packages each time you start a new session in R. A good habit to get into is to start each of your scripts with a chunk of code that loads all the packages you commonly use.
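As a sketch of what such a header might look like - the package list here is just an example, so substitute in whatever packages you actually use - you could write:

```r
# Example script header: install any packages that are missing, then load them all.
pkgs <- c("tidyverse")  # add any other packages you use regularly
missing <- pkgs[!pkgs %in% installed.packages()[, "Package"]]
if (length(missing) > 0) install.packages(missing)
invisible(lapply(pkgs, library, character.only = TRUE))
```

This way, anyone (including future you) can run the script on a fresh machine without first hunting down its dependencies.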
You can now load the AFL data set. To do so, copy the following line of code to your script and run it from the source editor:
AFL_data_set <- read_csv("AFL_data_set.csv")
Most of the code you will write in R follows this same basic structure: you use a function - read_csv( ) - to transform an input (our raw data from the csv file) into an R object - here, the ‘tibble’ (i.e., the tidy version of a data frame) named ‘AFL_data_set’.
If all goes as planned, you should see ‘AFL_data_set’ show up in the environment pane:
If you click on ‘AFL_data_set’, a spreadsheet-style viewer will display the data in the source editor:
Alternatively, to get a snapshot of the first ten rows of the data set, as well as a description of its structure and contents, you can simply type the name of the object in your script and hit run:
AFL_data_set
## # A tibble: 1,474 x 14
## Game_ID Player_ID Position GameTotalMins GameTotalDistan… Disposals
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1 2 DEFENCE 108. 13.3 17
## 2 1 5 DEFENCE 115. 14.2 21
## 3 1 6 MIDFIELD 90.1 12.3 12
## 4 1 7 DEFENCE 110. 13.2 19
## 5 1 8 FORWARD 109. 15.2 14
## 6 1 13 DEFENCE 116. 13.3 15
## 7 1 14 MIDFIELD 108. 14.3 17
## 8 1 16 MIDFIELD 95.8 11.9 21
## 9 1 21 DEFENCE 108. 13.8 11
## 10 1 22 MIDFIELD 91.8 12.0 14
## # … with 1,464 more rows, and 8 more variables: Disposal_efficiency <dbl>,
## # Goals <dbl>, Tackles <dbl>, Marks <dbl>, Clearances <dbl>,
## # Margin <dbl>, Rainfall_mm <dbl>, Wind_mph <dbl>
You should see some output (a crude table) pop up in your console. What can we learn from this output?
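As an aside, the tidyverse also provides the glimpse() function, which gives a compact, transposed view of a tibble - one line per variable, showing its type and first few values:

```r
glimpse(AFL_data_set)
```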
While this output provides a useful overview of our data set, another helpful way to explore our data is to run the following command, which outputs the first ten rows of the data in a slightly easier-to-read structure:
knitr::kable(AFL_data_set[1:10,])
| Game_ID | Player_ID | Position | GameTotalMins | GameTotalDistance_km | Disposals | Disposal_efficiency | Goals | Tackles | Marks | Clearances | Margin | Rainfall_mm | Wind_mph |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | DEFENCE | 108.5 | 13.2552 | 17 | 58.8 | 0 | 4 | 4 | 2 | 30 | 1.2 | 16 |
| 1 | 5 | DEFENCE | 115.4 | 14.2282 | 21 | 66.7 | 0 | 6 | 4 | 0 | 30 | 1.2 | 16 |
| 1 | 6 | MIDFIELD | 90.1 | 12.2865 | 12 | 66.7 | 0 | 5 | 3 | 2 | 30 | 1.2 | 16 |
| 1 | 7 | DEFENCE | 109.8 | 13.1969 | 19 | 78.9 | 1 | 2 | 2 | 2 | 30 | 1.2 | 16 |
| 1 | 8 | FORWARD | 109.3 | 15.2498 | 14 | 78.6 | 3 | 4 | 2 | 3 | 30 | 1.2 | 16 |
| 1 | 13 | DEFENCE | 116.5 | 13.2598 | 15 | 86.7 | 0 | 1 | 6 | 0 | 30 | 1.2 | 16 |
| 1 | 14 | MIDFIELD | 107.8 | 14.2644 | 17 | 76.5 | 0 | 2 | 5 | 7 | 30 | 1.2 | 16 |
| 1 | 16 | MIDFIELD | 95.8 | 11.8750 | 21 | 71.4 | 3 | 4 | 3 | 6 | 30 | 1.2 | 16 |
| 1 | 21 | DEFENCE | 108.2 | 13.7973 | 11 | 81.8 | 0 | 2 | 2 | 0 | 30 | 1.2 | 16 |
| 1 | 22 | MIDFIELD | 91.8 | 12.0372 | 14 | 71.4 | 1 | 5 | 4 | 2 | 30 | 1.2 | 16 |
To get a better sense of how to interpret the contents of our data set, I will interpret the first row reported above.
Now that you have a handle on our data set, I am going to get you to create some new variables. You will use the function mutate() to create these variables. I will also get you to use pipes (%>%) to string together pieces of code. You should get in the habit of using pipes. They can help you to compartmentalize blocks of complex code. Pipes also make your code easier to read - something you will appreciate if you ever have to return to and work with a piece of code you wrote in the distant past.
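To see why pipes help, compare the same operation written with nested function calls and with pipes (an illustrative example using our data; it computes average disposals by playing position):

```r
# Without pipes: you have to read the code from the inside out
summarise(group_by(AFL_data_set, Position), mean_disposals = mean(Disposals))

# With pipes: each step reads top to bottom
AFL_data_set %>%
  group_by(Position) %>%
  summarise(mean_disposals = mean(Disposals))
```

Both versions produce the same output; the piped version is simply easier to scan.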
First, I will get you to create a variable that measures running distance in meters per minute of game time. To generate this variable, you need to run the following chunk of code:
AFL_data_set <- AFL_data_set %>%
mutate(Meters_per_min=(GameTotalDistance_km*1000)/GameTotalMins)
Second, I will get you to create an indicator variable that takes a value of one if a player kicked 2+ goals in a game and also ran 14+ km (and zero otherwise). To generate this variable - which we will call ‘best_on_ground’ - you need to run the following chunk of code:
AFL_data_set <- AFL_data_set %>%
mutate(best_on_ground=ifelse(Goals >= 2 & GameTotalDistance_km >= 14, 1, 0))
To confirm that your code ran as intended, you should take another quick look at your data set to confirm that the new variables now show up:
knitr::kable(AFL_data_set[1:10,])
| Game_ID | Player_ID | Position | GameTotalMins | GameTotalDistance_km | Disposals | Disposal_efficiency | Goals | Tackles | Marks | Clearances | Margin | Rainfall_mm | Wind_mph | Meters_per_min | best_on_ground |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | DEFENCE | 108.5 | 13.2552 | 17 | 58.8 | 0 | 4 | 4 | 2 | 30 | 1.2 | 16 | 122.1677 | 0 |
| 1 | 5 | DEFENCE | 115.4 | 14.2282 | 21 | 66.7 | 0 | 6 | 4 | 0 | 30 | 1.2 | 16 | 123.2946 | 0 |
| 1 | 6 | MIDFIELD | 90.1 | 12.2865 | 12 | 66.7 | 0 | 5 | 3 | 2 | 30 | 1.2 | 16 | 136.3651 | 0 |
| 1 | 7 | DEFENCE | 109.8 | 13.1969 | 19 | 78.9 | 1 | 2 | 2 | 2 | 30 | 1.2 | 16 | 120.1903 | 0 |
| 1 | 8 | FORWARD | 109.3 | 15.2498 | 14 | 78.6 | 3 | 4 | 2 | 3 | 30 | 1.2 | 16 | 139.5224 | 1 |
| 1 | 13 | DEFENCE | 116.5 | 13.2598 | 15 | 86.7 | 0 | 1 | 6 | 0 | 30 | 1.2 | 16 | 113.8180 | 0 |
| 1 | 14 | MIDFIELD | 107.8 | 14.2644 | 17 | 76.5 | 0 | 2 | 5 | 7 | 30 | 1.2 | 16 | 132.3228 | 0 |
| 1 | 16 | MIDFIELD | 95.8 | 11.8750 | 21 | 71.4 | 3 | 4 | 3 | 6 | 30 | 1.2 | 16 | 123.9562 | 0 |
| 1 | 21 | DEFENCE | 108.2 | 13.7973 | 11 | 81.8 | 0 | 2 | 2 | 0 | 30 | 1.2 | 16 | 127.5166 | 0 |
| 1 | 22 | MIDFIELD | 91.8 | 12.0372 | 14 | 71.4 | 1 | 5 | 4 | 2 | 30 | 1.2 | 16 | 131.1242 | 0 |
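Rather than eyeballing the first ten rows, you can also tally the new indicator variable directly - a quick sanity check that the mutate() call did what you intended:

```r
# Counts how many player-game observations fall into each category (0 or 1)
AFL_data_set %>% count(best_on_ground)
```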
When you have large data sets, you will often only want to look at a subset or snapshot of your data. In the case of wide data sets (i.e., many columns), you may want to only look at or work with a subset of variables. In the case of long data sets (i.e., many rows), you may want to only look at or work with a subset of observations.
You can use the function select( ) to ‘narrow’ a wide data set (i.e., drop variables). For example, let’s say we want to create a tibble that contains the following four variables from our AFL data set: Player_ID, Game_ID, and the two new variables we just created (Meters_per_min, and best_on_ground). You can run the following code to create this ‘narrow’ data set:
narrow <- AFL_data_set %>%
select(Player_ID, Game_ID, Meters_per_min, best_on_ground)
If you take a quick peek at this new tibble, you can see that it contains only the four variables we wanted from our original, wider data set:
knitr::kable(narrow[1:10,])
| Player_ID | Game_ID | Meters_per_min | best_on_ground |
|---|---|---|---|
| 2 | 1 | 122.1677 | 0 |
| 5 | 1 | 123.2946 | 0 |
| 6 | 1 | 136.3651 | 0 |
| 7 | 1 | 120.1903 | 0 |
| 8 | 1 | 139.5224 | 1 |
| 13 | 1 | 113.8180 | 0 |
| 14 | 1 | 132.3228 | 0 |
| 16 | 1 | 123.9562 | 0 |
| 21 | 1 | 127.5166 | 0 |
| 22 | 1 | 131.1242 | 0 |
If you want to ‘shorten’ a long data set (i.e., drop observations), you can use the filter( ) command. For example, the following piece of code creates a tibble that only contains observations from our AFL data set for which best_on_ground=1 (i.e., observations where the player scored 2+ goals in the game and ran 14 km or more):
short <- AFL_data_set %>%
filter(best_on_ground==1)
You should then be able to see that this new tibble only contains observations from the AFL data set for which best_on_ground=1:
knitr::kable(short[1:10,])
| Game_ID | Player_ID | Position | GameTotalMins | GameTotalDistance_km | Disposals | Disposal_efficiency | Goals | Tackles | Marks | Clearances | Margin | Rainfall_mm | Wind_mph | Meters_per_min | best_on_ground |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8 | FORWARD | 109.3 | 15.2498 | 14 | 78.6 | 3 | 4 | 2 | 3 | 30 | 1.2 | 16 | 139.5224 | 1 |
| 2 | 29 | MIDFIELD | 99.4 | 14.4787 | 17 | 82.4 | 2 | 7 | 3 | 3 | -13 | 0.0 | 12 | 145.6610 | 1 |
| 3 | 28 | FORWARD | 101.6 | 14.2604 | 13 | 92.3 | 2 | 4 | 3 | 1 | 69 | 0.0 | 9 | 140.3583 | 1 |
| 3 | 42 | MIDFIELD | 104.2 | 14.3412 | 28 | 71.4 | 2 | 2 | 5 | 5 | 69 | 0.0 | 9 | 137.6315 | 1 |
| 5 | 9 | FORWARD | 108.1 | 14.8046 | 18 | 72.2 | 3 | 4 | 3 | 1 | 48 | 0.0 | 4 | 136.9528 | 1 |
| 5 | 28 | FORWARD | 102.2 | 14.6557 | 14 | 64.3 | 2 | 2 | 4 | 2 | 48 | 0.0 | 4 | 143.4022 | 1 |
| 7 | 40 | FORWARD | 117.2 | 14.2475 | 18 | 77.8 | 3 | 2 | 9 | 0 | 44 | 0.0 | 7 | 121.5657 | 1 |
| 7 | 42 | FORWARD | 113.8 | 14.1694 | 25 | 84.0 | 4 | 3 | 4 | 2 | 44 | 0.0 | 7 | 124.5114 | 1 |
| 8 | 40 | FORWARD | 117.2 | 14.6431 | 13 | 69.2 | 3 | 4 | 6 | 1 | 26 | 3.4 | 1 | 124.9411 | 1 |
| 8 | 42 | MIDFIELD | 111.5 | 14.8916 | 24 | 70.8 | 4 | 3 | 1 | 3 | 26 | 3.4 | 1 | 133.5570 | 1 |
When working with data, one of the very first things you will want to do is summarize the variables of interest in your data set. This is because it is often not feasible to look at the value a variable takes for every observation in the data set (and even if doing so were feasible, it is not clear what you would learn by just ‘eye-balling’ the data). Statistics tells us that a good way to summarize a variable is to describe its distribution. I will show you two common, easy-to-interpret ways to do this.
You can calculate summary statistics for a variable. I will get you to use the summary() function to do this for the variable GameTotalDistance_km:
summary(AFL_data_set$GameTotalDistance_km)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1829 12.2714 13.1155 12.9939 14.0226 17.0754
A downside of the summary() function is that it lacks many of the statistics we commonly use in economics and data science (e.g., standard deviation, etc). An alternative approach is to use the stargazer() function from the stargazer package (to use this approach you will need to install and load the stargazer package). Stargazer produces the following output, which provides a compact overview of the summary statistics for all the variables in our data set (NB: stargazer does not take tibbles as an input, hence why you need to convert the data to a data frame within the function):
stargazer(data.frame(AFL_data_set), type = "html")
| Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
| Game_ID | 1,474 | 35.307 | 20.521 | 1 | 17 | 54 | 70 |
| Player_ID | 1,474 | 27.988 | 15.152 | 1 | 15 | 42 | 53 |
| GameTotalMins | 1,474 | 99.827 | 13.892 | 1.320 | 93.422 | 108.295 | 129.520 |
| GameTotalDistance_km | 1,474 | 12.994 | 1.703 | 0.183 | 12.271 | 14.023 | 17.075 |
| Disposals | 1,474 | 17.389 | 7.364 | 0 | 12 | 22 | 48 |
| Disposal_efficiency | 1,474 | 73.396 | 13.012 | 0.000 | 65.875 | 82.325 | 100.000 |
| Goals | 1,474 | 0.624 | 1.023 | 0 | 0 | 1 | 7 |
| Tackles | 1,474 | 3.212 | 2.424 | 0 | 1 | 4 | 18 |
| Marks | 1,474 | 4.102 | 2.464 | 0 | 2 | 6 | 14 |
| Clearances | 1,474 | 1.716 | 2.293 | 0 | 0 | 2 | 13 |
| Margin | 1,474 | 20.258 | 38.755 | -51 | -11 | 42 | 133 |
| Rainfall_mm | 1,474 | 2.821 | 5.169 | 0.000 | 0.000 | 2.800 | 26.200 |
| Wind_mph | 1,474 | 7.430 | 5.267 | 0 | 2 | 11 | 20 |
| Meters_per_min | 1,474 | 130.702 | 9.913 | 93.902 | 124.020 | 137.629 | 173.652 |
| best_on_ground | 1,474 | 0.033 | 0.179 | 0 | 0 | 0 | 1 |
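If you prefer to stay within the tidyverse, you can also compute a custom set of summary statistics with summarise(). A sketch for a single variable:

```r
AFL_data_set %>%
  summarise(
    mean   = mean(GameTotalDistance_km),
    sd     = sd(GameTotalDistance_km),
    median = median(GameTotalDistance_km),
    min    = min(GameTotalDistance_km),
    max    = max(GameTotalDistance_km)
  )
```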
You can also summarize a variable by visualizing its distribution. A common way to do so is to produce a density plot (you should have seen these in your statistics or econometrics courses). A density plot is like a smoothed version of a histogram: it plots the values of the variable on the x-axis and how often those values show up in the data set on the y-axis.
If you run the following chunk of code, you will get a density plot for the variable GameTotalDistance_km. This plot captures the full distribution, rather than just a subset of the distribution’s ‘moments’ (e.g., mean, variance, etc.):
plot(density(AFL_data_set$GameTotalDistance_km), main='Total Distance Run',
xlab='Km')
This plot is pretty informative (we see that a player runs around 13km in a typical game; some players run a bit more, some players run a lot less), but it also masks a lot of cross-sectional variation. For instance, how does the distribution of running distance vary by playing position? With R, this sort of analysis is fairly trivial to execute - just run a piece of code like the following:
plot(density(filter(AFL_data_set, Position == "MIDFIELD")$GameTotalDistance_km), col='red', main='Total Distance Run',
xlab='Km')
lines(density(filter(AFL_data_set, Position == "FORWARD")$GameTotalDistance_km), col="blue")
lines(density(filter(AFL_data_set, Position == "DEFENCE")$GameTotalDistance_km), col="green")
legend("topright", legend=c("Midfield", "Forward", "Defender"),
col=c("red", "blue", "green"), lty=1, cex=0.8)
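As an aside, loading the tidyverse also gives you ggplot2, so the same grouped comparison can be produced in a single call - a sketch, in case you prefer that style:

```r
ggplot(AFL_data_set, aes(x = GameTotalDistance_km, colour = Position)) +
  geom_density() +
  labs(title = "Total Distance Run", x = "Km", y = "Density")
```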
Most of the time when we work with data, we aren’t just interested in looking at variables by themselves. Instead, we most often want to know how variables are related to each other. How are variables correlated? How can one variable be used to predict another? Does one variable cause another? To answer these sorts of questions, we need to understand how variables are related (we also need to impose a set of assumptions on the underlying data-generating process if we want to make causal claims). While we won’t cover the latter in detail in this module, we will teach you how to do the former - i.e., understand whether one variable is associated with another variable.
First, you can use plots and other visuals to understand whether variables are associated with one another. For example, let’s look at the relationship between the total number of goals kicked by a team and the final margin of the game. As these variables are by definition mechanically related, we expect a strong positive relationship to show up in our plot. To produce a scatter plot of these variables, you can run the following code (NB: before you plot the raw variables, you need to aggregate the player-level data to the team level, hence the use of the group_by() function in the code):
grouped <- AFL_data_set %>%
group_by(Game_ID) %>%
summarise(total_goals=sum(Goals), margin=mean(Margin))
plot(grouped$total_goals, grouped$margin,
xlab='Total Goals', ylab='Margin')
Although it is not terribly surprising, the plot above shows a strong positive relationship between the total number of goals a team scores and the final margin of the game.
Second, we can quantify the association between two variables by calculating their correlation. R provides a function that allows you to do just this, which we use in the line of code below to look at the association between the total number of goals a team scores and the final margin of the game:
cor(grouped$total_goals, grouped$margin)
## [1] 0.772114
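If you also want a formal test of whether this correlation is statistically distinguishable from zero, base R provides cor.test(), which reports the correlation alongside a confidence interval and a p-value:

```r
cor.test(grouped$total_goals, grouped$margin)
```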
Third, we can also use a regression to quantify the association between two variables. A neat feature of regression is that it also allows us to include additional variables (‘controls’) in the model. This can be especially powerful if we suspect that a confounding variable is creating (or masking) a relationship between the two variables of interest.
In the following examples, we will regress margin on total goals using the lm() function. The first model you will run assumes a simple linear relationship between margin and goals; the second model allows for the weather conditions at the ground to affect both the number of goals a team scores and the margin of the game.
summary(lm(margin~total_goals, data = grouped))
##
## Call:
## lm(formula = margin ~ total_goals, data = grouped)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.190 -17.445 -0.225 18.073 65.266
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -69.1788 9.5968 -7.208 7.42e-10 ***
## total_goals 6.4913 0.6627 9.796 2.01e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.96 on 65 degrees of freedom
## Multiple R-squared: 0.5962, Adjusted R-squared: 0.5899
## F-statistic: 95.95 on 1 and 65 DF, p-value: 2.012e-14
grouped_2 <- AFL_data_set %>%
group_by(Game_ID) %>%
summarise(total_goals=sum(Goals), margin=mean(Margin), wind=mean(Wind_mph), rainfall=mean(Rainfall_mm))
summary(lm(margin~total_goals+wind+rainfall, data = grouped_2))
##
## Call:
## lm(formula = margin ~ total_goals + wind + rainfall, data = grouped_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.795 -17.069 1.109 18.009 62.065
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -80.26043 11.76090 -6.824 4.04e-09 ***
## total_goals 6.76716 0.68263 9.913 1.76e-14 ***
## wind 0.95891 0.60619 1.582 0.119
## rainfall 0.04296 0.60371 0.071 0.943
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.83 on 63 degrees of freedom
## Multiple R-squared: 0.6127, Adjusted R-squared: 0.5943
## F-statistic: 33.23 on 3 and 63 DF, p-value: 5.326e-13
As we can see in the above output, the coefficient on total_goals is both positive and statistically significant at the 1% level. This is not a great revelation, but it nonetheless makes sense: teams that kick more goals win games by a greater margin.
As a quick aside, you may be wondering why the coefficient on total_goals is larger than six (i.e., the number of points a team scores when they kick a goal). I’ll leave it to you to work out why this might be the case - but I will provide a hint: because game time is finite, when a team scores a goal, this imposes an opportunity cost on the opposition…
In this tutorial, I showed you how to set up R, perform some basic commands, and explore the data set we discussed in class. This should put you in good stead to work through the exercise we are going to tackle next in class. This exercise will deal much more directly with many of the concepts related to performance management that you have discussed in this course.
Hopefully, this tutorial has also given you a taste for the sort of analysis you can perform using R. As I stated earlier in class, R is a great language that you can learn on your own using free online resources. And, one final comment: what I have taught you has real-world applications; in fact, it is exactly the sort of basic ‘data science’ work that goes on in industry. As such, many employers are very keen to hire graduates with a solid grasp of R for accounting and finance roles…